
llava : fix tokenization to not add bos after system prompt #3645

Merged: 2 commits from fix-llava into master on Oct 16, 2023

Conversation

@ggerganov (Member) commented Oct 16, 2023

We had a bug where a BOS token was added unconditionally on every eval_string call:

https://github.com/ggerganov/llama.cpp/blob/11bff290458f12f020b588792707f76ec658a27a/examples/llava/llava-utils.h#L52-L57

https://github.com/ggerganov/llama.cpp/blob/11bff290458f12f020b588792707f76ec658a27a/examples/llava/llava.cpp#L128-L132

This PR fixes that by adding BOS only for the system prompt:

https://github.com/ggerganov/llama.cpp/blob/e0fb74c6ee343f7a4e4a1ca349004c7bb2db99f0/examples/llava/llava.cpp#L129-L131
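In effect, the change gives eval_string an explicit add_bos flag and sets it only for the system prompt. A minimal sketch (the helper names follow the linked files, but treat the exact signatures and the prompt variables as approximations):

```cpp
// examples/llava/llava-utils.h (sketch): tokenize with a leading BOS only
// when the caller asks for it, instead of unconditionally on every call
static bool eval_string(struct llama_context * ctx_llama, const char * str,
                        int n_batch, int * n_past, bool add_bos) {
    std::string str2 = str;
    std::vector<llama_token> embd_inp = ::llama_tokenize(ctx_llama, str2, add_bos);
    return eval_tokens(ctx_llama, embd_inp, n_batch, n_past);
}

// examples/llava/llava.cpp (sketch): BOS is added once, for the system prompt;
// the user prompt that follows the image embeddings is tokenized without it
eval_string(ctx_llama, system_prompt, params.n_batch, &n_past, true);   // add_bos = true
eval_image_embd(ctx_llama, image_embd, n_img_pos, params.n_batch, &n_past);
eval_string(ctx_llama, user_prompt, params.n_batch, &n_past, false);    // add_bos = false
```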

Some anecdotal testing shows that the performance with the 13B model is now much better:

./llava -m ./models/llava-13b-v1.5/ggml-model-q4_k.gguf --mmproj ./models/llava-13b-v1.5/mmproj-model-f16.gguf --image ~/Downloads/274487006-f051e177-c7a1-49a1-a0c5-a0f0cad87b3a.jpg --temp 0.0 -p 'Please read the text in this image and return the information in the following JSON format (note xxx is placeholder, if the information is not available in the image, put "N/A" instead).\n{"class": xxx, "DLN": xxx, "DOB": xxx, "Name": xxx, "Address": xxx, "EXP": xxx, "ISS": xxx, "SEX": xxx, "HGT": xxx, "WGT": xxx, "EYES": xxx, "HAIR": xxx, "DONOR": xxx}' -e

# master

 {
"class": "California driver's license",
"DLN": "CA DL 123456789",
"DOB": "01/01/1990",
"Name": "John Doe",
"Address": "123 Main St, Anytown, CA 12345",
"EXP": "01/31/2024",
"ISS": "California Department of Motor Vehicles",
"SEX": "Male",
"HGT": "6'2\"",
"WGT": "190 lbs",
"EYES": "Brown",
"HAIR": "Brown",
"DONOR": "No"
}

# PR

{"class": "California", "DLN": "1123456789", "DOB": "08/23/1971", "Name": "Ima Cardholder", "Address": "1234 N Main Street, Anytown, CA 92678", "EXP": "08/23/2014", "ISS": "California", "SEX": "F", "HGT": "5' 3"", "WGT": "123 lbs", "EYES": "Brown", "HAIR": "Black", "DONOR": "Sex F"

@Green-Sky (Collaborator)

Genius discovery!! This should explain why it complained about not seeing any image.

@ggerganov (Member, Author)

Yes, I expect this patch to fix that, but I haven't tested it yet.

@monatis (Collaborator) left a comment

Tested with the llama.cpp logo and it's much better:

"The image features a white background with a large orange and black logo in the center. The logo is the name "LlamaC++" written in bold letters, likely representing a company or product related to programming or software development. The contrast between the white background and the vibrant colors of the logo makes it stand out prominently in the scene."

Previously it was talking about people gathering for an activity.

@monatis (Collaborator) commented Oct 16, 2023

Thanks for the fix. Now we can expect better performance with 7B as well, I guess 🚀

@Green-Sky (Collaborator)

I am trying to replicate the previous results by manually specifying the old 1234 seed, but the result is very different every time, so there might be a bug somewhere (not in this PR).

@monatis (Collaborator) commented Oct 16, 2023

@Green-Sky Try with the last commit.

@Green-Sky (Collaborator)

@monatis thanks, though I will also have to patch master.

@FSSRepo (Collaborator) commented Oct 16, 2023

I thought it was necessary every time after the system prompt was evaluated.

@FSSRepo (Collaborator) commented Oct 16, 2023

[image: rata-topo-desnuda_103]

Question: What animal do you see in the picture?

Model 7B:

I see a large, hairless animal in the picture.

I expected naked mole rat or something.

@Green-Sky (Collaborator) left a comment

Did some testing, but the original "I don't see an image" issue was already fixed, likely by the system prompt updates @monatis did. The responses do look better now.

@Green-Sky (Collaborator) commented Oct 16, 2023

> I thought it was necessary every time after the system prompt was evaluated.

Yeah, maybe, but before this PR it was clearly wrong, since it inserted a BOS in between USER: and <user prompt here>.
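To illustrate (a hedged sketch of the token layout only; the literal prompt text is abbreviated):

```
# master (buggy): a stray BOS lands between the image embeddings and the user text
<bos> <system prompt> USER: <image embeddings> <bos> <user prompt> ASSISTANT:

# PR: BOS appears exactly once, at the start of the system prompt
<bos> <system prompt> USER: <image embeddings> <user prompt> ASSISTANT:
```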

@monatis (Collaborator) commented Oct 16, 2023

@FSSRepo 13B in f16 (temperature = 0.2):

"In the image, I see a hairless mole with long nails."

The hyperparameters above are the most comparable to the online demo, to the best of my knowledge.

@FSSRepo (Collaborator) commented Oct 16, 2023

> @FSSRepo 13B in f16 (temperature = 0.2):

I'm using ggml-model-7b-q4_k.gguf and temperature = 0.1.

@monatis merged commit 940efa9 into master on Oct 16, 2023
@jxy (Contributor) commented Oct 17, 2023

This fix is wonderful. Now it works fine even with vicuna-13b-v1.5-Q8_0.gguf, instead of the finetuned llava model.

With the naked mole rat image above, it gives:

The animal in the picture is a naked mole-rat, also known as a hairless mole. It is a small mammal with a long, pointed nose and a hairless body. The naked mole-rat is native to Africa and is known for its unique social behavior and ability to live in large colonies.

And with the DL image and the JSON prompt:

{
"class": "DLN",
"DLN": "123456789",
"DOB": "01/01/1980",
"Name": "Iris Cardholder",
"Address": "123 Main St, Anytown, CA 123456",
"EXP": "08/15/2023",
"ISS": "CA",
"SEX": "F",
"HGT": "5'6\"",
"WGT": "125 lbs",
"EYES": "Brown",
"HAIR": "Black",
"DONOR": "N/A"
}

@Green-Sky (Collaborator)

> Now it works fine even with vicuna-13b-v1.5-Q8_0.gguf, instead of the finetuned llava model.

How? @haotian-liu how much do the projection matrix and the finetuned CLIP ViT contribute in comparison to the LLM?

@monatis (Collaborator) commented Oct 17, 2023

CLIP itself is not finetuned in LLaVA. I think the model @jxy is referring to is this model, which undergoes further finetuning for instruction following to become LLaVA v1.5, to my knowledge.

Only the LLM part is finetuned; it is bridged to the frozen CLIP layers via a two-layer MLP (the multimodal projector) during pretraining.
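For context, that projector is just a small MLP applied to the CLIP image features. A minimal sketch using ggml ops (the GELU placement follows the LLaVA-1.5 "mlp2x_gelu" design; the function and tensor names here are hypothetical):

```cpp
#include "ggml.h"

// Two-layer MLP multimodal projector (sketch): maps CLIP image features
// into the LLM's embedding space. Weight/bias tensor names are hypothetical.
static struct ggml_tensor * llava_project(
        struct ggml_context * ctx,
        struct ggml_tensor * img_feats, // CLIP features, one column per patch
        struct ggml_tensor * w1, struct ggml_tensor * b1,
        struct ggml_tensor * w2, struct ggml_tensor * b2) {
    struct ggml_tensor * cur = ggml_mul_mat(ctx, w1, img_feats); // first linear layer
    cur = ggml_add(ctx, cur, b1);
    cur = ggml_gelu(ctx, cur);                                   // GELU non-linearity
    cur = ggml_mul_mat(ctx, w2, cur);                            // second linear layer
    cur = ggml_add(ctx, cur, b2);
    return cur; // one embedding per patch, consumed by the LLM like token embeddings
}
```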

@monatis deleted the fix-llava branch on October 17, 2023 13:09
@jxy (Contributor) commented Oct 17, 2023

> CLIP itself is not finetuned in LLaVA. I think the model @jxy is referring to is this model, which undergoes further finetuning for instruction following to become LLaVA v1.5, to my knowledge.

I used the actual Vicuna v1.5, from https://huggingface.co/TheBloke/vicuna-13B-v1.5-GGUF/resolve/main/vicuna-13b-v1.5.Q8_0.gguf

You may try it yourself. I also tried other 13B models; they all kind of work.

@Green-Sky (Collaborator)

Fascinating that the embedding projection matrix does so much by itself.

@haotian-liu

@ggerganov Cool! Thanks for identifying and fixing this. The model now works much better.

@Green-Sky

After the first-stage pretraining, LLaVA can actually answer some basic questions quite well, even following the format instructions (Table 6 in the LLaVA-1.5 paper).

But there are some tasks it is not yet good at, like reasoning about the relationships between objects in the scene. Those tasks would probably require the finetuned LLM as well.

joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 19, 2023
* 'master' of github.com:ggerganov/llama.cpp:
  fix embeddings when using CUDA (ggml-org#3657)
  llama : avoid fprintf in favor of LLAMA_LOG (ggml-org#3538)
  readme : update hot-topics & models, detail windows release in usage (ggml-org#3615)
  CLBlast: Fix temporary buffer size for f16 conversion (wsize)
  train-text-from-scratch : fix assert failure in ggml-alloc (ggml-org#3618)
  editorconfig : remove trailing spaces
  server : documentation of JSON return value of /completion endpoint (ggml-org#3632)
  save-load-state : fix example + add ci test (ggml-org#3655)
  readme : add Aquila2 links (ggml-org#3610)
  tokenizer : special token handling (ggml-org#3538)
  k-quants : fix quantization ranges (ggml-org#3646)
  llava : fix tokenization to not add bos between image embeddings and user prompt (ggml-org#3645)
  MPT : support GQA for replit-code-v1.5 (ggml-org#3627)
  Honor -ngl option for Cuda offloading in llava (ggml-org#3621)